A two-parameter generalized Poisson model to improve the analysis of RNA-seq data

نویسندگان

  • Sudeep Srivastava
  • Liang Chen
چکیده

Deep sequencing of RNAs (RNA-seq) has been a useful tool to characterize and quantify transcriptomes. However, there are significant challenges in the analysis of RNA-seq data, such as how to separate signals from sequencing bias and how to perform reasonable normalization. Here, we focus on a fundamental question in RNA-seq analysis: the distribution of the position-level read counts. Specifically, we propose a two-parameter generalized Poisson (GP) model to the position-level read counts. We show that the GP model fits the data much better than the traditional Poisson model. Based on the GP model, we can better estimate gene or exon expression, perform a more reasonable normalization across different samples, and improve the identification of differentially expressed genes and the identification of differentially spliced exons. The usefulness of the GP model is demonstrated by applications to multiple RNA-seq data sets.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Bayesian paradigm for analysing count data in longitudina studies using Poisson-generalized log-gamma model

In analyzing longitudinal data with counted responses, normal distribution is usually used for distribution of the random efffects. However, in some applications random effects may not be normally distributed. Misspecification of this distribution may cause reduction of efficiency of estimators. In this paper, a generalized log-gamma distribution is used for the random effects which includes th...

متن کامل

کاربرد مدل رگرسیون پواسنی تعمیم یافته در تحلیل داده‌های باروری زنان روستایی استان فارس

Background & objectives: statistical modeling explicates the observed changes in data by means of mathematics equations. In cases that dependent variable is count, Poisson model is applied. If Poisson model is not applicable in a specific situation, it is better to apply the generalized Poisson model. So, our emphasis in this study is to notice the data structure, introducing the generalized Po...

متن کامل

Sample Size Calculation of RNA-sequencing Experiment-A Simulation-Based Approach of TCGA Data

Power and sample size calculation is an essential component of experimental design in biomedical research. For RNA-sequencing experiments, sample size calculations have been proposed based on mathematical models such as Poisson and negative binomial; however, RNA-seq data has exhibited variations, i.e. over-dispersion, that has caused past calculation methods to be underor over-power. Because o...

متن کامل

The Negative Binomial Distribution Efficiency in Finite Mixture of Semi-parametric Generalized Linear Models

Introduction Selection the appropriate statistical model for the response variable is one of the most important problem in the finite mixture of generalized linear models. One of the distributions which it has a problem in a finite mixture of semi-parametric generalized statistical models, is the Poisson distribution. In this paper, to overcome over dispersion and computational burden, finite ...

متن کامل

مقایسه کارایی مدلهای رگرسیون پواسن تعمیم یافته با رگرسیون پواسن استاندارد در تحلیل رفتار باروری زنان شهر کاشان درسال 1391

Introduction: Different statistical methods can be used to analyze fertility data. When the response variable is discrete, Poisson model is applied. If the condition does not hold for the Poisson model, its generalized model will be applied. The goal of this study was to compare the efficiency of generalized Poisson regression model with the standard Poisson regression model in estimating the c...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 38  شماره 

صفحات  -

تاریخ انتشار 2010